This study investigates particulate matter (PM 2.5) air quality data alongside four Social Vulnerability Index (SVI) metrics across North Carolina counties and visually compares them against the locations of retired and operating power plants. Environmental justice is a persistent concern in America, and with the sudden increase in electricity consumption from the rise of data centers, it is as important as ever to make sure injustices are not overlooked. The environmental justice movement traces its origins to Warren County, North Carolina, where an African American community was chosen as the site of a hazardous waste landfill, sparking national conversations about systemic environmental inequities. This historical context is the reason North Carolina was chosen as the area of focus for this research.
The study focuses on coal-burning power plants in particular because they are a major source of air pollution, specifically the very dangerous pollutant PM 2.5. PM 2.5 from coal combustion is rich in sulfur dioxide, black carbon, and metals, which can enter a person’s lungs and bloodstream and lead to conditions such as cancer, asthma, and even premature death. A study reported by the NIH estimated that for every 1 μg/m³ increase in coal PM 2.5, mortality in the studied regions increased by 1.12%. Given North Carolina’s unsettling environmental justice history, we seek to explore the connections between coal-burning power plants, PM 2.5 air pollution levels, and social vulnerability indices to potentially reveal disparities in air quality and public health impacts.
Possible Questions:
Is there a correlation between PM 2.5 concentrations in North Carolina and the proximity of coal-burning power plants?
Do counties with higher Social Vulnerability Indexes (SVI) have higher concentrations of PM 2.5?
Are there more coal-burning power plants in these counties?
Are there more clean energy power plants in lower SVI counties?
Is there a three-way relationship between coal-burning power plants, SVI, and PM 2.5 concentrations in North Carolina?
This research draws on five datasets: Power Plants, North Carolina Retired Generators, the Social Vulnerability Index (SVI), Particulate Matter 2.5 Air Quality, and North Carolina Counties.
Power Plants and Retired Generators:
The Power Plant data set was collected from the open data site of the
Geospatial Management Office of the U.S. Department of Homeland
Security. The shapefile was created for the Homeland Infrastructure
Foundation-Level Database and the Energy modeling community at large.
This data contains electric power plants around the United States
including the following plant types: hydroelectric dams, fossil fuels
(coal, natural gas, or oil), nuclear, solar, wind, geothermal, and
biomass. The main variables that are used in this study are plant
name, state location of plant, status of plant (Operating or Retired),
primary fuel of plant, and geographic location.
The GeoJSON version of this data was copied into RStudio to load the dataset. The scope was first narrowed to plants located in North Carolina, and the primary fuel field was then used to filter out plants that would not produce PM 2.5. According to the EPA’s eGRID code lookup, the primary fuel abbreviations found in this dataset represent the following: BIT (bituminous coal), AB (agricultural byproduct), BLQ (black liquor), DFO (distillate fuel oil, light fuel oil, FO2, diesel oil), LFG (landfill gas), NG (natural gas), OBG (digester gas, methane, and other biomass gases), SLW (sludge waste), WDS (wood, wood waste solid), WH (waste heat), SUN (solar), WND (wind), WAT (water), MWH (electricity), and NUC (nuclear). The PM-producing plants were grouped into four data frames: BIT-only plants; other relatively moderate PM-producing plants (WDS, SLW, BLQ, AB, and DFO); very low PM-producing plants (LFG, NG, OBG, and WH); and a combination of the BIT plants and the moderate PM-producing plants. This last data frame was filtered to plants with a status of either operating or retired. A second data set was then merged into the retired plants data frame to add a column of retirement years. This data set comes from the U.S. Energy Information Administration (https://www.eia.gov/electricity/data/eia860m/) and contains all generators retired in North Carolina as of October 2024. The merge matched PLANT_CODE in the plant data to Plant.ID in the retired generator data.
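The grouping and merge steps described above can be sketched in base R. This is a minimal illustration rather than the study's actual code: the data frames `plants` and `retired_gen`, the helper `pm_category()`, the `PM_CATEGORY` column, and all values are hypothetical stand-ins, and only the column names PLANT_CODE, NAME, STATUS, PRIM_FUEL, Plant.ID, and Retirement.Year come from the variable tables.

```r
# Hypothetical miniature stand-ins for the two data sets; all values are made up.
plants <- data.frame(
  PLANT_CODE = c("1001", "1002", "1003", "1004", "1005"),
  NAME       = c("Plant A", "Plant B", "Plant C", "Plant D", "Plant E"),
  STATUS     = c("Retired", "Operating", "Operating", "Operating", "Retired"),
  PRIM_FUEL  = c("BIT", "WDS", "NG", "SUN", "BIT"),
  stringsAsFactors = FALSE
)
retired_gen <- data.frame(
  Plant.ID        = c(1001L, 1005L),
  Retirement.Year = c(2013L, 2020L)
)

# Group primary fuel codes into the PM categories described in the text.
pm_category <- function(fuel) {
  ifelse(fuel == "BIT", "coal",
    ifelse(fuel %in% c("WDS", "SLW", "BLQ", "AB", "DFO"), "moderate",
      ifelse(fuel %in% c("LFG", "NG", "OBG", "WH"), "very low", "excluded")))
}
plants$PM_CATEGORY <- pm_category(plants$PRIM_FUEL)

# Keep BIT plus moderate-PM plants with a status of operating or retired.
pm_plants <- subset(plants,
                    PM_CATEGORY %in% c("coal", "moderate") &
                      STATUS %in% c("Operating", "Retired"))

# Attach retirement years to the retired plants. The key columns differ in
# name and type, so the character plant code is converted to integer first.
retired <- subset(pm_plants, STATUS == "Retired")
retired$PLANT_CODE <- as.integer(retired$PLANT_CODE)
retired <- merge(retired, retired_gen,
                 by.x = "PLANT_CODE", by.y = "Plant.ID", all.x = TRUE)
print(retired[, c("PLANT_CODE", "NAME", "PRIM_FUEL", "Retirement.Year")])
```

Using `all.x = TRUE` keeps retired plants that have no match in the generator table, so missing retirement years show up as `NA` instead of silently dropping rows. Because the EIA data is at the generator level, a real plant may match several rows, which would need to be collapsed to one retirement year per plant before or after the merge.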
Social Vulnerability Index: The CDC’s Social Vulnerability Index (SVI) dataset provides biennial data on indices that measure social vulnerability. These indices are organized into four key themes, each representing the average of various indicators: Theme 1 (Socioeconomic Status), Theme 2 (Household Composition and Disability), Theme 3 (Minority Status and Language), and Theme 4 (Housing Type and Transportation). Additionally, the dataset includes an overall summary ranking variable, which aggregates the values from these themes to provide a comprehensive measure of social vulnerability.
Geodatabase data was pulled from the CDC website (https://www.atsdr.cdc.gov/place-health/php/svi/svi-data-documentation-download.html) in order to retain spatial information for mapping. There was an issue with importing the 2014 geodatabase (GDB) data, so the 2014 data was imported as a CSV file instead. Data was filtered to include only counties in North Carolina (using STATEFP == 37). A join was then performed between the NC Counties spatial data frame and the 2014 SVI data using the GEOID column from the spatial data and the FIPS column from the SVI data. Subsequently, SVI data for North Carolina counties from 2016, 2018, 2020, and 2022 were read in as geodatabases, since these imported directly from the website without issue. To have data frames that also allowed for GLM and linear regression analysis, additional non-spatial data frames for SVI were created by using st_drop_geometry() to drop the spatial component of the SVI data.
Next, the code focuses on creating consolidated datasets for analysis. Four separate themes (Theme 1, Theme 2, Theme 3, and Theme 4) and an aggregate measure (RPL_THEMES) are processed by selecting the relevant theme columns from each dataset (2014, 2016, 2018, 2020, and 2022) and performing inner joins using FIPS as the key. The resulting data frames provide a longitudinal view of each theme’s values across the years for each county. Column names are updated to reflect the year associated with each column. Finally, the code consolidates the themes and aggregate measures into respective data frames for further analysis, preparing the data for generalized linear model (GLM) analysis or other statistical approaches.
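The consolidation step described above (select one theme per year, rename it with its year, then inner-join on FIPS) might look like the following base R sketch. The helpers `make_svi()` and `select_theme()`, the list `svi_by_year`, and the values are hypothetical and for illustration only; the column names FIPS and RPL_THEME1 come from the SVI variable table.

```r
# Hypothetical per-year SVI tables for three counties; all values are made up.
years <- c(2014, 2016, 2018, 2020, 2022)
make_svi <- function(year) {
  data.frame(
    FIPS       = c("37001", "37003", "37005"),
    RPL_THEME1 = round(c(0.40, 0.55, 0.70) + (year - 2014) / 100, 2),
    stringsAsFactors = FALSE
  )
}
svi_by_year <- lapply(years, make_svi)
names(svi_by_year) <- years

# Keep FIPS plus one theme column, renaming the column to carry its year.
select_theme <- function(df, year, theme = "RPL_THEME1") {
  out <- df[, c("FIPS", theme)]
  names(out)[2] <- paste0(theme, "_", year)
  out
}
theme_tables <- Map(select_theme, svi_by_year, years)

# Inner-join the yearly tables on FIPS (merge() performs an inner join by default).
theme1_wide <- Reduce(function(x, y) merge(x, y, by = "FIPS"), theme_tables)
print(theme1_wide)
```

The result has one row per county and one column per year, which is the wide layout the text describes. Keeping FIPS as a character column in every year avoids key mismatches between the CSV-based 2014 data and the geodatabase-based later years.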
Particulate Matter:
North Carolina Counties: The North Carolina Counties dataset was loaded primarily to supply spatial information for the above datasets. This integration enables the mapping of data, which is a critical step for visualization and spatial analysis. It was used to link the power plant and SVI data with geographic boundaries, which made it possible to create maps of social vulnerability and power plant locations across North Carolina counties and to identify spatial patterns.
| Variable | Description | Units |
|---|---|---|
| PLANT_CODE | Power Plant Code ID | Character |
| NAME | Name of Power Plant | Character |
| STATE | State Plant is Located | Character |
| STATUS | Operating Status of Plant | Character |
| COUNTY | County Plant is Located | Character |
| COUNTYFIPS | County FIPS | Character |
| PRIM_FUEL | Primary Fuel of Plant | Character |
| LATITUDE | Latitude of Plant | Double-precision decimal number |
| LONGITUDE | Longitude of Plant | Double-precision decimal number |
| Variable | Description | Units |
|---|---|---|
| FIPS | County FIPS Code | Character |
| STATE | State | Character |
| RPL_THEME1 | Theme 1 Percentile Ranking | Double-precision decimal number |
| RPL_THEME2 | Theme 2 Percentile Ranking | Double-precision decimal number |
| RPL_THEME3 | Theme 3 Percentile Ranking | Double-precision decimal number |
| RPL_THEME4 | Theme 4 Percentile Ranking | Double-precision decimal number |
| RPL_THEMES | Overall Summary Ranking Variable | Double-precision decimal number |
| Variable | Description | Units |
|---|---|---|
| Date | Date Data was Taken | Character |
| Site ID | ID Number of Site Location | Number |
| Daily Mean PM2.5 Concentration | Daily Mean of PM 2.5 Concentration | Number (μg/m³) |
| Local Site Name | Name of Site | Character |
| State | State of Site | Character |
| County | County of Site | Character |
| Site Latitude | Site Latitude | Double-precision decimal number |
| Site Longitude | Site Longitude | Double-precision decimal number |
| Variable | Description | Units |
|---|---|---|
| Plant.Name | Name of Plant | Character |
| Plant.ID | Plant ID | Integer |
| Retirement.Year | Year Plant Retired | Integer |
| Variable | Description | Units |
|---|---|---|
| STATEFP | State Code, 37 for NC | Character |
| COUNTY | County Name | Character |
| geometry | Geometry | sfc_MULTIPOLYGON |
Below is a map of North Carolina Coal-Burning Power Plants with BIT Primary Fuel Source
Below is a map of North Carolina Power Plants including BIT plants and 5 other Primary Fuel Types (WDS, SLW, BLQ, AB, DFO)
Below is a map of Operating vs. Retired North Carolina Power Plants. As shown, all of the retired plants are BIT primary fuel plants, which suggests that the highest PM-emitting plants are being retired ahead of other types.
Below is an interactive view of all these maps side by side.
These maps provide a visualization of PM2.5 concentration levels and SVI percentiles across North Carolina over several two-year time periods (2013–2014, 2015–2016, 2017–2018, 2019–2020, and 2021–2022). The maps highlight distinct regional trends in both air quality and social vulnerability, illustrating how these factors evolve over time and interact geographically.
In general, the PM2.5 concentration levels demonstrate a statewide downward trend, with lower concentrations observed in later years compared to earlier periods. However, the western part of the state consistently shows areas with higher PM2.5 concentrations, particularly near counties where operating power plants and other industrial facilities are located. For instance, counties near active BLQ and DFO units in western North Carolina display elevated PM2.5 levels in the earlier years, which may reflect their proximity to emissions sources.
The SVI percentiles reveal clusters of higher social vulnerability in the eastern part of the state, particularly in counties where PM2.5 monitors are sparse. These areas with higher social vulnerability often overlap with historically underserved communities that may be at greater risk due to a lack of air quality monitoring and mitigation measures. Some western counties, such as Jackson and Swain, show increasing SVI percentiles alongside worsening PM2.5 levels in certain periods. Counties like Cleveland and Mecklenburg reflect notable changes over time. Cleveland County, home to a plant in Shelby, shows increased social vulnerability after 2014, coinciding with higher environmental risks. Mecklenburg County exhibits worsening PM2.5 levels and an increased SVI percentile by 2022. Montgomery County and Halifax County also experience slight increases in PM2.5 levels and SVI values, likely influenced by the presence of nearby industrial units. Interestingly, Hyde County in the eastern region exhibits a rising SVI percentile despite a general reduction in PM2.5 levels over the years, suggesting that social factors driving vulnerability may not be directly linked to air quality improvements. The maps also highlight scale adjustments in PM2.5 measurements, with generally lower concentration levels in later years, reflecting statewide efforts to reduce particulate pollution.
These findings underscore the complex relationship between air quality and social vulnerability, highlighting the need for targeted interventions in areas facing persistent social and environmental challenges. While the western part of the state, characterized by lower SVI values, appears to have higher PM2.5 concentrations, it is crucial to consider the lack of PM2.5 monitors in the eastern regions where SVI tends to be higher. This monitoring gap may obscure evidence of potentially rising PM2.5 concentrations in areas with higher social vulnerability, underscoring the need for improved air quality surveillance in underserved regions.
## GLMs/Linear Regression:
Figure X: Analysis of corrplot: When looking at the Correlation Ellipse
Plot, we are looking at (i) the elliptical shape, (ii) the
direction/slope of the ellipse, and (iii) the intensity of the ellipse’s
color to examine pairwise correlations. According to our correlation
plot, the strongest correlations across 2014 to 2022 appear for the
following variable pairs: (a) the aggregate of all four SVI themes and
SVI theme 1, and (b) the aggregate of all four SVI themes and SVI theme
4, since the narrower ellipse shapes and darker ellipse colors suggest a
very high correlation among these pairs. This makes sense since SVI
theme 1 is one of the four themes that make up the SVI aggregate.
Although the elliptical shapes and color intensities suggest strong
correlations between the other SVI themes, individually, and the SVI
aggregate, they are not as strong as the two pairs of variables first
mentioned.
Figure X: Further correlation plot analysis: When looking at the mixed
correlation plot, which combines the ellipses with the exact correlation
values, we can confirm our prior observations when we see the
correlation values for (a) the SVI aggregate and SVI theme 1 and (b) the
SVI aggregate and SVI theme 4 are 0.85 and 0.78, respectively.
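The pairwise correlations read off the ellipse and mixed plots come from a correlation matrix, which can be computed with base R's `cor()`. The sketch below uses randomly generated stand-in values, so its numbers will not match the 0.85 and 0.78 reported above. The column names are hypothetical, and the aggregate is approximated here as a simple row mean, which is an assumption made for illustration and not how the CDC constructs RPL_THEMES.

```r
set.seed(1)

# Made-up theme percentiles for 100 hypothetical counties.
n <- 100
svi <- data.frame(
  theme1 = runif(n),
  theme2 = runif(n),
  theme3 = runif(n),
  theme4 = runif(n)
)

# Assumption for illustration: aggregate taken as the mean of the four themes.
svi$themes_agg <- rowMeans(svi[, paste0("theme", 1:4)])

# Pairwise Pearson correlations, tolerating missing values pair by pair.
cor_mat <- cor(svi, use = "pairwise.complete.obs")
print(round(cor_mat, 2))

# Rank the individual themes by their correlation with the aggregate.
agg_cor <- sort(cor_mat["themes_agg", paste0("theme", 1:4)], decreasing = TRUE)
print(round(agg_cor, 2))

# If the corrplot package is installed, the same matrix can be drawn with
# corrplot::corrplot(cor_mat, method = "ellipse") or
# corrplot::corrplot.mixed(cor_mat, lower = "number", upper = "ellipse").
```

Because the aggregate is built from the themes, some positive correlation between each theme and the aggregate is expected by construction, which is why the relative strength across themes is the more informative comparison.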
summary(multi.lin.reg)
##
## Call:
## lm(formula = meanPM ~ theme1_value + theme2_value + theme3_value +
## theme4_value + Number_of_Plants, data = combined_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3413 -0.8170 0.0277 0.8120 2.6377
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.3234 0.6025 10.496 1.9e-14 ***
## theme1_value -2.6377 1.0514 -2.509 0.0153 *
## theme2_value 1.7538 0.9162 1.914 0.0611 .
## theme3_value 0.5452 0.9852 0.553 0.5824
## theme4_value 1.4891 1.0901 1.366 0.1778
## Number_of_Plants 0.1579 0.1103 1.432 0.1581
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.457 on 52 degrees of freedom
## (70 observations deleted due to missingness)
## Multiple R-squared: 0.2195, Adjusted R-squared: 0.1445
## F-statistic: 2.925 on 5 and 52 DF, p-value: 0.02115
Analysis of multiple linear regression/AIC: We used the Akaike Information Criterion (AIC) to evaluate the fit of a multiple regression model of mean PM2.5 on the SVI aggregate (‘theme_agg_value’) and ‘Number_of_Plants’. The starting AIC value is 50.54. Removing the variable ‘theme_agg_value’ increases the AIC slightly, and then removing the variable ‘Number_of_Plants’ increases it slightly once again. Because a smaller AIC indicates a better balance between fit and model complexity, this suggests that including both variables better explains the variation in mean PM2.5.
However, a very low R^2 value, 0.08383, suggests that despite their inclusion, the SVI aggregate and number of power plants do not explain much of the variance in mean PM2.5 values from 2014 to 2022. Because the model explains only 8.38% of the variance, the multiple linear regression model may not provide a good fit for our data, and we should look to other factors that might better explain the variance in mean PM2.5 from 2014 to 2022.
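The AIC comparison and the R^2 interpretation can be illustrated with a small base R sketch. The data here are simulated, so the output will not reproduce the 50.54 or 0.08383 reported above, and the use of `drop1()` is an assumption about how term-by-term AIC values could be obtained, not a record of the study's code. The variable names follow those quoted in the text.

```r
set.seed(42)

# Simulated stand-in for the combined county data; all values are made up.
n <- 60
combined <- data.frame(
  theme_agg_value  = runif(n),
  Number_of_Plants = rpois(n, lambda = 1)
)
combined$meanPM <- 7 + 0.5 * combined$theme_agg_value +
  0.2 * combined$Number_of_Plants + rnorm(n)

# Multiple linear regression of mean PM2.5 on the SVI aggregate and plant count.
full <- lm(meanPM ~ theme_agg_value + Number_of_Plants, data = combined)

# drop1() lists the AIC of the full model ("<none>") and of the model
# obtained by removing each term in turn; lower AIC is preferred.
aic_table <- drop1(full)
print(aic_table)

# R^2 is the share of variance in meanPM explained by the model:
# 1 - (residual sum of squares / total sum of squares).
rss <- sum(residuals(full)^2)
tss <- sum((combined$meanPM - mean(combined$meanPM))^2)
r2_manual <- 1 - rss / tss
r2_model  <- summary(full)$r.squared
print(c(manual = r2_manual, model = r2_model))
```

The AIC values printed by `drop1()` and `step()` come from `extractAIC()`, which for linear models differs from `AIC()` by an additive constant, so values are only comparable within the same function and data set. A model can have the lowest AIC among the candidates and still have a low R^2, which is the situation the paragraph above describes.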
Figure X and Figure X: Analysis of scatterplots: After plotting the
relationships of (i) mean PM2.5 and the number of power plants from 2014
to 2022 and (ii) mean PM2.5 and the SVI aggregate from 2014 to 2022, the
resulting scatterplots are consistent with the regression results: there
appears to be little to no relationship between mean PM2.5 and either the
number of power plants or the SVI aggregate for the years 2014 to
2022.
“Environmental Justice History.” U.S. Department of Energy, www.energy.gov/lm/environmental-justice-history.
Doctrow, Brian. “Deaths associated with pollution from coal power plants.” National Institutes of Health, 12 Dec. 2023, www.nih.gov/news-events/nih-research-matters/deaths-associated-pollution-coal-power-plants.
“eGRID.” Environmental Protection Agency, www.epa.gov/egrid/code-lookup.